Switching Characteristics of Generalized Array Multiplier Architectures and their Applications to Low Power Design

Authors

  • Khurram Muhammad
  • Dinesh Somasekhar
  • Kaushik Roy
Abstract

This paper presents several new array multiplier architectures for reducing the switching activity in general digital signal processing applications. A general cellular structure is described which can be used to obtain any array multiplier suitable for a given application. The switching activity at the output nodes of the cells in this structure is analyzed and compared with a tree multiplier based on 4:2 compressors. It is shown that the relative improvement in power is a function of the statistical properties of the signal. It is also shown that selecting an appropriate array architecture can give up to 40% reduction in switching activity compared to a tree multiplier. For commonly occurring situations, it can also give more than 3 times less switching activity than the widely used least-significant-bit-first array multiplier. We also outline applications of the proposed multipliers to the areas of low power quantization, reconfigurable computing and high-level synthesis for low power.

This work was supported in part by DARPA (F33615-95-C-1625), NSF CAREER award (9501869-MIP), Rockwell, AT&T and Lucent foundation.

With the recent trend toward increasing mobility and performance in small hand-held mobile communication and portable computing equipment, low power has become an important design factor. New features are continually provided using DSP algorithms, which are dominated by three basic operations: add, shift and multiply. Many DSP algorithms can be implemented so that the data is processed in carry save (CS) format, because this format yields zero cost of accumulation [1] in the multiply-and-accumulate (MAC) operation. The conversion of the result to normal binary form can be delayed for as long as the given algorithm allows, since this results in a significantly faster implementation. Consider, for example, a digital filter implementation.
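The Python sketch below makes the "zero cost of accumulation" point concrete with a carry-save MAC loop. It is a minimal sketch under several assumptions. Bit vectors are modeled as Python integers, and `csa` and `cs_mac` are hypothetical helper names. For brevity, each product is formed with ordinary multiplication rather than by a CS multiplier array. The running result is kept as a (sum, carry) pair, so no carry propagates inside the loop. A single conversion to binary happens at the end.

```python
"""Minimal sketch of multiply-and-accumulate in carry-save (CS) form.

Assumption: Python integers model the bit vectors; each 3:2 compression
step is what one row of full adders would compute in hardware.
"""


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, moved to next weight
    return s, c


def cs_mac(data: list[int], coeffs: list[int]) -> int:
    """Accumulate sum(d * k) while keeping the running result as (sum, carry)."""
    s, c = 0, 0
    for d, k in zip(data, coeffs):
        s, c = csa(s, c, d * k)  # no carry-propagate addition inside the loop
    return s + c                 # single CS-to-binary conversion at the end


if __name__ == "__main__":
    x = [3, 7, 1, 4]
    h = [2, 5, 6, 1]
    print(cs_mac(x, h))  # 3*2 + 7*5 + 1*6 + 4*1 = 51
```

In hardware, each bit position of `csa` is one full adder, so its delay does not depend on word length. The only carry-propagate addition, `s + c`, is deferred to the end. This mirrors the delayed CS-to-binary conversion described above.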
In such an application, the intermediate result (the accumulation of a given inner product of data and coefficients) can be kept in CS format. The CS-to-binary conversion then takes place only after the final result is computed in CS form. Consequently, multiplier architectures that process data in CS format are of particular interest. Multiplication operations are considered to be the dominant computation in DSP algorithms [2], [3]. Since computation directly results in dynamic power consumption [4], multiplication is an equally important factor when considering the dynamic power dissipation of such algorithms. In general, high-performance DSP architectures are required in mobile units which process data at high transmission rates, or in a portable computer providing advanced multimedia features. For this reason, such units are generally constructed with pipelined array multipliers. If the latency of the pipelined architecture is an important consideration, a pipelined tree multiplier can be used. Both types of multipliers can be easily pipelined using the conventional register-based approach, or by using wave pipelining. Over the past few years, a number of papers have addressed multiplier topologies for a variety of applications [1], [6], [7]. In particular, the array structures proposed in [6] address pipelining of recursive digital filters using most significant bit (MSB) first digit-serial arithmetic. However, to the best of our knowledge, no work has been reported in the literature which addresses dynamic switching activity trade-offs between popular multiplier architectures as a function of the statistical properties of the inputs. In this paper, we explore array structures from the point of view of dynamic power dissipation.
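Switching activity is commonly estimated by counting bit transitions between successive values on a node. The sketch below is only a word-level proxy, which is an assumption of this illustration; the paper analyzes the output nodes of the multiplier cells by simulation. It counts transitions on an 8-bit input bus for two cases: a slowly varying, correlated sequence and uniformly random data. The functions `toggles` and `toggle_rate` and the random-walk signal model are hypothetical choices made for illustration.

```python
import random


def toggles(words: list[int], width: int = 8) -> int:
    """Total number of bit transitions between consecutive words on a bus."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))


def toggle_rate(words: list[int], width: int = 8) -> float:
    """Average transitions per bit per sample (a simple switching-activity figure)."""
    if len(words) < 2:
        return 0.0
    return toggles(words, width) / (width * (len(words) - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical correlated input: a slow random walk clamped to 8 bits.
    walk, v = [], 128
    for _ in range(1000):
        v = min(255, max(0, v + rng.choice((-1, 0, 1))))
        walk.append(v)
    # Uncorrelated input: independent uniformly distributed 8-bit words.
    uncorrelated = [rng.randrange(256) for _ in range(1000)]
    print(f"correlated:   {toggle_rate(walk):.3f}")
    print(f"uncorrelated: {toggle_rate(uncorrelated):.3f}")
```

For independent, uniformly random words, each bit toggles with probability one half per sample. Comparing that case with the random walk shows how correlation in the input changes the number of transitions presented to the multiplier.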
One might expect any ordering of an array multiplier to yield similar dynamic power dissipation. Contrary to this expectation, we will show that more than a 3 times reduction in switching activity may be possible compared to the commonly used least significant bit (LSB) first array multipliers (also known as right-left multipliers), depending on the characteristics of the input signals. This is because a salient feature of DSP algorithms is that their computations are governed by the statistical properties of the underlying process generating the data. In general, data signals are correlated and, consequently, rapidly changing data is seldom processed. Hence, we will explore the effects of signal statistics on the output switching activity in various array structures, in order to assess the feasibility of using a given structure when signal statistics are known or predictable. We will show that re-ordering the partial product additions can result in a significant reduction in switching activity (hence, dynamic power) if the signal statistics are known a priori. This observation leads to new array multiplier architectures which form hybrids of MSB-first and LSB-first structures. We also discuss the application of such multipliers to low power implementation of DSP algorithms and to the general area of reconfigurable computing. The main objective of this work is to identify which types of architectures are best suited for processing signals with known statistical properties for reduced dynamic power dissipation. There are three major contributions of this work:

  • We propose hybrid-array structures which combine LSB-first and MSB-first types of array multipliers. For appropriate signal conditions, these structures are shown to significantly reduce dynamic power dissipation.
  • The switching characteristics of array multipliers are compared with a tree multiplier based on 4:2 compressors, as well as with the most commonly used LSB-first multiplier, to show the region of strength of each architecture. Hence, this work can be used to formulate an appropriate strategy for selecting the best order of partial product addition for reducing power dissipation in a given DSP task. Alternatively, when processing signals with known statistical properties, one can formulate a strategy for applying signals to the multiplier inputs in the order which most effectively reduces dynamic power dissipation.
  • The architectures presented in this paper provide new insights into the general area of low power design and reconfigurable computing.

This paper is organized into five sections. Section II describes the array multiplier architectures considered in this work. Section III presents a simulation-based study of the switching characteristics of output nodes in these architectures, and also explains the signal models used to compute the performance of these multipliers. Section IV discusses the applications of these structures to general signal processing algorithms. Finally, Section V concludes this paper.

We will first present a simple approach for obtaining various types of array multipliers. Figure 1 shows a template for a cellular array structure which serves as the basis for generating different types of 8-bit array multipliers. Each location in this matrix can be occupied by a cell, which can be an AND gate (AND), a half adder (HA) or a full adder (FA). In the sequel, the cell at location $(i, j)$ will be referred to as $c_{i,j}$. As an example, the cells at the four corners are shown labeled in the figure. Let $A = a_0, a_1, \ldots, a_{N-1}$ and $B = b_0, b_1, \ldots, b_{N-1}$ represent the input vectors applied at the right and top, respectively. The output is represented by $P = p_0, p_1, \ldots, p_{2N-1}$. Then each partial product $a_i \cdot b_j$, where $i, j = 0, 1, \ldots, N-1$, must be added in the appropriate relative position to obtain the correct value of $P$.

In figure 1, we have shown the structure of the LSB-first type array multiplier by the colored cells comprising a parallelogram. In this figure, the continuous lines show the presence of connections, while the dashed lines show their absence. Hence, the active connections in a CS type of array multiplier are shown using the continuous lines. To reduce clutter, the connections from primary inputs to the appropriate cells are not shown explicitly and are assumed implicit. By counting the number of active inputs, one can determine the type of cell. Hence, the cells in row #0 are all AND gates, whereas the seven rightmost cells in row #1 are HAs. The cells accepting three active inputs are FAs. Note that the inputs are counted by including the implicit input $a_i \cdot b_j$, which is not shown. For clarity, the resulting CS array multiplier structure is shown on the right in figure 1.

Fig. 1. Basic template for constructing array multipliers.

Now, the goal of an array multiplier is to add the partial products from cells which occupy the same column in the cellular array structure shown in figure 1. The order in which these partial products are added is not important. We only need to ensure that only the products in the same column are added, together with the carries generated by the cells in the adjacent column on the right. Hence, one can exchange rows #3 and #7 as shown in figure 1. The cells of row #3 after moving to row #7 are shown by cells shaded by circles. The cells of row #7 after moving to row #3 are shown by dark colored cells. Now, we only need to ensure that the carries generated from the next rows are correctly added, which may require extra cells. Let $R = (r_0, r_1, r_2, \ldots, r_{N-1})$ be the set of indices which represents an ordering of successive additions of rows of partial products. Then the ordering given by $r_i = i$ for $i = 0, 1, \ldots, N-1$ expresses the LSB-first multiplier shown in figure 1. The MSB-first multiplier can be expressed similarly by the ordering $r_i = N-1-i$ for $i = 0, 1, \ldots, N-1$. Clearly, there are $N!$ ways to construct carry save array multipliers. Each of these multipliers may be constructed by propagating the carry in ripple form, in CS form, or in a combination of these. This formulation is the basis for generating the various architectures of interest, which are evaluated in this paper for their switching activity performance.

A. LSB-First Multipliers

The LSB-first multiplier can be constructed either using the CS format shown in figure 1 or using a ripple carry structure. We will refer to the former as the LSB-first CS multiplier and to the latter as the LSB-first RP multiplier. The LSB-first RP multiplier is the best-known and most widely used array structure for multiplication. It is obtained from the LSB-first CS multiplier of figure 1 in two steps. First, the diagonal lines are turned off (made dashed). Second, the horizontal dashed lines which connect cell $c_{i,j}$ to $c_{i,j+1}$ are turned on (made continuous) for all cells $c_{i,j}$ with $i = 0, 1, \ldots, N-1$ and $j = N-i, N-i+1, \ldots, 2N-i-2$ (right-most cell excepted). The vector merge row (row #N+1) is then no longer required. The advantage of the CS format is the reduction in propagation delay through the multiplier: the LSB-first RP multiplier has a 30% longer critical path than the LSB-first CS multiplier. In this work, we consider both, since our objective is to highlight the switching characteristics of various array multipliers.

B. MSB-First Multipliers

An MSB-first multiplier places the MSBs of input A at the top row positions, as shown in figure 2. The main idea is to flip the cells in the cellular array of figure 1 about a horizontal axis, so that row #i is moved to row #(N-1-i) for $i = 0, 1, \ldots, N-1$. This results in an MSB-first multiplier [B].
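The ordering formulation can be checked at word level. The Python sketch below is an illustration under stated assumptions, not the paper's gate-level model. Row $i$ holds the partial products $a_i \cdot b_j$ at column $i + j$, and rows are added in the sequence $R$. Word-level integer addition stands in for both the CS and the ripple-carry variants. The sketch confirms that every one of the $N!$ orderings produces the same product, including LSB-first ($r_i = i$) and MSB-first ($r_i = N-1-i$). The function names are hypothetical.

```python
from itertools import permutations


def partial_product_row(a: int, b: int, i: int, n: int) -> int:
    """Row i of the array: bit a_i AND b_j placed at column (weight) i + j."""
    a_i = (a >> i) & 1
    row = 0
    for j in range(n):
        b_j = (b >> j) & 1
        row |= (a_i & b_j) << (i + j)  # one AND-gate output per cell
    return row


def array_multiply(a: int, b: int, order: tuple[int, ...], n: int) -> int:
    """Add the N partial-product rows in the sequence R = (r_0, ..., r_{N-1})."""
    if sorted(order) != list(range(n)):
        raise ValueError("R must be a permutation of 0..N-1")
    acc = 0
    for r in order:
        acc += partial_product_row(a, b, r, n)  # word-level add hides CS vs ripple
    return acc


def lsb_first(n: int) -> tuple[int, ...]:
    return tuple(range(n))  # r_i = i


def msb_first(n: int) -> tuple[int, ...]:
    return tuple(n - 1 - i for i in range(n))  # r_i = N - 1 - i


if __name__ == "__main__":
    n = 4
    orders = list(permutations(range(n)))
    print(len(orders))  # 24, i.e. N! for N = 4
    print(all(array_multiply(a, b, r, n) == a * b
              for r in orders
              for a in range(1 << n)
              for b in range(1 << n)))  # True: the order of row additions does not matter
```

Because the sum is order-independent, the choice among the $N!$ orderings can be made on other grounds, such as switching activity, delay or wiring, without affecting correctness.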
The carry in this multiplier can be propagated either in CS form or in ripple form, the latter in a fashion identical to the LSB-first RP multiplier. The multiplier using the CS format has been presented in [B] for pipelining recursive digital filters. A major advantage of the MSB-first CS multiplier is that the delay through the vector merge stage can be reduced, because the MSB-first array produces the MSBs before the LSBs. Hence, a carry-select structure can be constructed in the region occupied by cells $c_{i,j}$ for $i > j$ to improve the vector merge delay. Consequently, the MSB-first CS array multiplier can improve the speed of multiplication [B]. The observation that the MSBs of the product are available before the LSBs is fundamental to the construction of the MSB-first RP multiplier shown in figure 2(b). In contrast to the LSB-first RP multiplier, it has the same propagation delay as the LSB-first CS multiplier and offers an attractive alternative to it.

C. Hybrid Multipliers

A hybrid multiplier is obtained by any ordering of the elements of R which is not monotone. Note that there is only one monotonically increasing ordering of the elements of R, and it leads to the LSB-first structure. Similarly, the only monotonically decreasing ordering leads to the MSB-first structure. Any ordering other than these two leads to a hybrid array multiplier. In this paper, we consider only two types of hybrid structures. The first structure places L consecutive LSB bits of operand A as the L topmost rows and is shown on the left in figure 3. The second structure places L consecutive MSB bits of operand A as the L topmost rows and is shown on the right in figure 3. We will refer to the former as
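The Python sketch below classifies orderings and makes a rough comparison between them. The classifier follows the monotonicity rule stated above. The two hybrid constructors are assumptions: the text only specifies that L consecutive LSB (or MSB) rows are placed on top. The order chosen here for the remaining rows is therefore hypothetical: MSB-first below an LSB block, and LSB-first below an MSB block. The function `switching_proxy` (also hypothetical) counts bit toggles in the running sum after each row between successive input samples. It ignores glitches and the CS sum/carry split, so it is only a rough word-level stand-in for the cell-output switching activity studied in the paper. No particular ranking of the printed numbers should be assumed.

```python
import random


def classify(order: tuple[int, ...]) -> str:
    """Name an ordering R as in the text: monotone up, monotone down, or hybrid."""
    seq = list(order)
    if seq == sorted(seq):
        return "LSB-first"
    if seq == sorted(seq, reverse=True):
        return "MSB-first"
    return "hybrid"


def hybrid_lsb_top(n: int, l: int) -> tuple[int, ...]:
    """Hypothetical: L LSB rows on top (ascending), remaining rows MSB-first."""
    return tuple(range(l)) + tuple(range(n - 1, l - 1, -1))


def hybrid_msb_top(n: int, l: int) -> tuple[int, ...]:
    """Hypothetical: L MSB rows on top (descending), remaining rows LSB-first."""
    return tuple(range(n - 1, n - 1 - l, -1)) + tuple(range(n - l))


def row_outputs(a: int, b: int, order: tuple[int, ...]) -> list[int]:
    """Running sum after each row is added: a word-level stand-in for row outputs."""
    acc, outs = 0, []
    for r in order:
        acc += ((a >> r) & 1) * (b << r)  # row r contributes a_r * B shifted by r
        outs.append(acc)
    return outs


def switching_proxy(pairs: list[tuple[int, int]], order: tuple[int, ...]) -> int:
    """Bit toggles at every row output between consecutive input samples."""
    total, prev = 0, None
    for a, b in pairs:
        cur = row_outputs(a, b, order)
        if prev is not None:
            total += sum(bin(x ^ y).count("1") for x, y in zip(prev, cur))
        prev = cur
    return total


if __name__ == "__main__":
    n, rng = 8, random.Random(1)
    # Hypothetical input model: A varies slowly (random walk), B is uniform random.
    a, pairs = 100, []
    for _ in range(500):
        a = min(255, max(0, a + rng.choice((-2, -1, 0, 1, 2))))
        pairs.append((a, rng.randrange(256)))
    for name, r in [("LSB-first", tuple(range(n))),
                    ("MSB-first", tuple(range(n - 1, -1, -1))),
                    ("hybrid, L=3", hybrid_lsb_top(n, 3))]:
        print(f"{name:12s} {classify(r):9s} {switching_proxy(pairs, r)}")
```

With $N = 8$, `hybrid_lsb_top(8, 3)` gives $R = (0, 1, 2, 7, 6, 5, 4, 3)$, which is not monotone and is therefore classified as a hybrid.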

Publication date: 1999